Abstract

We consider how to assign labels to any undirected graph with n nodes such that, given the labels 
of two nodes and no other information regarding the graph, it is possible to determine the distance 
between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize 
the maximum label length and secondarily to minimize the time needed to answer distance queries
(decoding). Previous schemes have offered different trade-offs between label lengths and query time. 
This paper presents a simple algorithm with shorter labels and shorter query time than any previous 
solution, thereby improving the state-of-the-art with respect to both label length and query time 
in one single algorithm. Our solution addresses several open problems concerning label length and 
decoding time and is the first improvement of label length for more than three decades. 

More specifically, we present a distance labeling scheme with labels of length $\frac{\log 3}{2}n + o(n)$ bits
and constant decoding time. This outperforms all existing results with respect to both size and
decoding time, including Winkler’s (Combinatorica 1983) decades-old result, which uses labels of size
$(\log 3)n$ and $O(n/\log n)$ decoding time, and Gavoille et al. (SODA’01), which uses labels of size
$11n + o(n)$ and $O(\log\log n)$ decoding time. In addition, our algorithm is simpler than the previous
ones. In the case of integral edge weights of size at most $W$, we present almost matching upper and
lower bounds for the label size $\ell$: $\frac{1}{2}(n-1)\log\lceil W/2+1\rceil \le \ell \le \frac{1}{2}n\log(2W+1) + O(\log n\cdot\log(nW))$.
Furthermore, for r-additive approximation labeling schemes, where distances can be off by up to 
an additive constant r, we present both upper and lower bounds. In particular, we present an 
upper bound for 1-additive approximation schemes which, in the unweighted case, has the same size 
(ignoring second order terms) as an adjacency labeling scheme, namely n/2. We also give results for 
bipartite graphs as well as for exact and 1-additive distance oracles.

1 Introduction


A distance labeling scheme for a given family of graphs assigns labels to the nodes of each graph from 
the family such that, given the labels of two nodes in the graph and no other information, it is possible 
to determine the shortest distance between the two nodes. The labels are assumed to be composed of 
bits. The main goal is to make the worst-case label size as small as possible while, as a subgoal, keeping 
query (decoding) time under control. The problem of finding implicit representations with small labels 
for specific families of graphs was first introduced by Breuer [13, 14], and efficient labeling schemes were 
introduced in [43, 51]. 

1.1 Distance labeling 

For an undirected, unweighted graph, a naive solution to the distance labeling problem is to let each
label be a table with the $n-1$ distances to all the other nodes, giving labels of size around $n\log n$
bits. For graphs with bounded degree $\Delta$ it was shown [14] in the 1960s that labels of size $2n\Delta$ can be
constructed such that two nodes are adjacent whenever the Hamming distance [41] of their labels is at
most $4\Delta-4$. In the 1970s, Graham and Pollak [38] proposed to label each node with symbols from
$\{0,1,*\}$, essentially representing nodes as corners in a “squashed cube”, such that the distance between
two nodes exactly equals the Hamming distance of their labels (the distance between $*$ and any other
symbol is set to 0). They conjectured the smallest dimension of such a squashed cube (the so-called
Squashed cube conjecture), and their conjecture was subsequently proven by Winkler [65] in the 1980s.
This reduced the label size to $\lceil(n-1)\log 3\rceil$, but the solution requires $O(n/\log n)$ query time to decode
distances. Combining [43] and [50] gives a lower bound of $\lceil n/2\rceil$ bits. A different distance labeling
scheme of size $11n + o(n)$ and with $O(\log\log n)$ decoding time was proposed in [36]. That article also
posed the right label size as an open problem. Later, in [63], the algorithm from [36] was modified
so that the decoding time was further reduced to $O(\log^* n)$ with slightly larger labels, although still of
size $O(n)$. That article posed as an open problem whether the query time can be reduced to constant
time. Distance labeling with short labels and simultaneously fast decoding is a problem also
addressed in textbooks such as [58]. Some of our solutions are simple enough to replace material
in textbooks.

Addressing the aforementioned open problems, we present a distance labeling scheme with labels of
size $\frac{\log 3}{2}n + o(n)$ bits and with constant decoding time. See Table 1 and Figure 1 for an overview.


Space                    Decoding time       Year         Reference
$(\log 3)n$              $O(n/\log n)$       1972/1983    [38, 65]
$11n$                    $O(\log\log n)$     2001         [36]
$cn$, $c > 11$           $O(\log^* n)$       2011         [63]
$\frac{\log 3}{2}n$      $O(1)$              2015         this paper

Table 1: Unweighted undirected graphs. Space is listed without second order terms. A
graphical presentation of the results is given in Figure 1.

Distance labeling schemes for various families of graphs exist, e.g., for trees [5, 55], bounded tree-width
[36], distance-hereditary [34], bounded clique-width [21], some non-positively curved plane [18],
interval [35] and permutation graphs [10]. In [36] it is proved that distance labels require $\Theta(\log^2 n)$
bits for trees, $O(\sqrt{n}\log n)$ and $\Omega(n^{1/3})$ bits for planar graphs, and $\Omega(\sqrt{n})$ bits for bounded degree
graphs. In an unweighted graph, two nodes are adjacent iff their distance is 1. Hence, lower bounds
for adjacency labeling apply to distance labeling as well, and adjacency lower bounds can be achieved
by reduction [43] to induced-universal graphs, e.g. giving $n/2$ and $n/4$ for general and bipartite graphs,
respectively. An overview of adjacency labeling can be found in [7].

Figure 1: A graphical representation of the results from Table 1, plotting space (vertical axis) against decoding time (horizontal axis).

Various computability requirements are sometimes imposed on labeling schemes [2, 43, 45]. This 
paper assumes the RAM model and mentions the time needed for decoding in addition to the label size. 

1.2 Overview of results 

For weighted graphs we assume integral edge weights from $[1,W]$. Letting each node save the distance
to all other nodes would require a scheme with labels of size $O(n\log(nW))$ bits. Let $\mathrm{dist}_G(x,y)$ denote
the shortest distance in $G$ between nodes $x$ and $y$. An $r$-additive approximation scheme returns a value
$\mathrm{dist}_R(x,y)$, where $\mathrm{dist}_G(x,y) \le \mathrm{dist}_R(x,y) \le \mathrm{dist}_G(x,y) + r$.

Throughout this paper we will assume that $\log W = o(\log n)$ since otherwise the naive solution
mentioned above will be as good as our solution. Ignoring second order terms, we can for general
weighted graphs and constant decoding time achieve upper and lower bounds for label length as stated
in Table 2. For bipartite graphs we also show a lower bound of $\frac{1}{4}n\log\lfloor 2W/3 + 5/3\rfloor$ and an upper
bound of $\frac{1}{2}n$ whenever $W = 1$.


Problem           Lower bound                                  Upper bound
General graphs    $\frac{1}{2}(n-1)\log\lceil W/2+1\rceil$     $\frac{1}{2}n\log(2W+1)$

Table 2: General graphs with weights from $[1,W]$, where $\log W = o(\log n)$. The upper bound has an
extra $o(n)$ term, and decoding takes constant time.


We present, as stated in Table 3, several trade-offs between decoding time, edge weight $W$, and
space needed for the second order term.


Time      Second order term                                          W
N/A       $O(\log n\cdot\log(nW))$                                   Any value
$O(n)$    $O(\log^2 n)$                                              $O(1)$
$O(1)$    $O(\frac{n}{\log n}\log(2W+1)(\log\log n + \log W))$       $2^{o(\log n)}$

Table 3: Second order term for the upper bound for general graphs (in Table 2). The results also hold
for the $n/2$ labels in the unweighted, bipartite case. It may be possible to relax the restriction $W = O(1)$
if the word “finite” in Lemma 2.2 below from [24] does not mean “constant”.


We also show that, for any k,D > 0 with logk = o(logn) and D < 2{k + 1)IF — 1, there exists a 
(2A:IF + |~27FPWF^1 )-additive distance scheme using labels of size log(2(/c + 1)IF + 1 — O) + 


0(logn ' 


2{k+l)W-D 

og(nlF)) bits. 

Finally, we present lower bounds for approximation schemes. In particular, for $r < 2W$ we prove
that labels of $\Omega(n\log(W/(r+1)))$ bits are required for an $r$-additive distance labeling scheme.

1.3 Approximate distance labeling schemes and oracles 

Approximate distance labeling schemes are well studied; see e.g., [36, 39, 40, 55, 62]. For instance,
graphs of doubling dimension [59] and planar graphs [60] both enjoy schemes with polylogarithmic label
length which return approximate distances below a $1+\varepsilon$ factor of the exact distance. Approximate
schemes that return a small additive error have also been investigated, e.g. in [17, 33, 48]. In [32], lower
and upper bounds for $r$-additive schemes, $r \le 2$, are given for chordal, AT, permutation and interval
graphs. For general graphs the current best lower bound [32] for $r \ge 2$-additive schemes is $\Omega(\sqrt{n/r})$.
For $r = 1$, one needs $n/4$ bits since a 1-additive scheme can answer adjacency queries in bipartite graphs.
Using our approximate result, we achieve, by setting $k = 0$ and $D = W = 1$, a 1-additive distance
labeling scheme which, ignoring second order terms, has the same size (namely $n/2$ bits) as an optimal
adjacency labeling scheme. Somewhat related, [11] studies labeling schemes that preserve exact distances
between nodes with minimum distance $P$, giving an $O((n/P)\log^2 n)$ bit solution.

Approximate distance oracles, introduced in [62], use a global table (not necessarily labels) from
which approximate distance queries can be answered quickly. One can naively use the $n$ labels in a
labeling scheme as a distance oracle (but not vice versa). For unweighted graphs, we achieve constant
query time for 1-additive distance oracles using $\frac{1}{2}n^2 + o(n^2)$ bits in total, matching (ignoring second
order terms) the space needed to represent a graph. Other techniques only reduce space for $r$-additive
errors for $r > 1$. For exact distances in weighted graphs, our solution achieves $\frac{1}{2}n^2\log(2W+1) + o(n^2)$
bits for $\log W = o(\log n)$. This relaxes the requirement of $W = O(1)$ in [28] (and slightly improves the
space usage in that paper).

1.4 Second order terms are important 

Chung’s solution in [19] gives labels of size $\log n + O(\log\log n)$ for adjacency labeling in trees, which
was improved to $\log n + O(\log^* n)$ in [9] and in [12, 29, 30, 44] to $\log n + O(1)$ for various special cases.
A recent STOC’15 paper [7] improves label size for adjacency in general graphs from $n/2 + O(\log n)$
to $n/2 + O(1)$. Likewise, the second order term for ancestor relationship is improved in a sequence of
STOC/SODA papers [2, 8, 4, 30, 31] (and [1]) to $\Theta(\log\log n)$, giving labels of size $\log n + \Theta(\log\log n)$.

Somewhat related, succinct data structures (see, e.g., [24, 26, 27, 52, 53]) focus on the space used 
in addition to the information theoretic lower bound, which is often a lower order term with respect to 
the overall space used. 

1.5 Labeling schemes in various settings and applications 

By using labeling schemes, it is possible to avoid costly access to large global tables, computing instead
locally and in a distributed fashion. Such properties are used, e.g., in XML search engines [2], network routing and
distributed algorithms [22, 25, 61, 62], dynamic and parallel settings [20, 47], graph representations [43],
and other applications [45, 46, 54, 55, 56]. At SIGMOD, labeling schemes have been used in [3, 42]
for shortest path queries and in [16] for reachability queries. Finally, we observe that compact 2-hop
labeling (a specific distance labeling scheme) is central for computing exact distances on real-world
networks with millions of arcs in real-time [23].

1.6 Outline of the paper 

Section 3 illustrates some of our basic techniques. Sections 4 and 5 present our upper bounds for 
exact distance labeling schemes for general graphs. Section 6 presents upper bounds for approximate
distances. Our lower bounds are rather simple counting arguments with reduction to adjacency and
have been placed in Appendix A. 

2 Preliminaries 

Trees. Given a rooted tree $T$ and a node $u$ of $T$, denote by $T_u$ the subtree of $T$ consisting of all
the descendants of $u$ (including itself). The depth of $u$ is the number of edges on the unique simple
path from $u$ to the root of $T$. For any rooted subtree $A$ of $T$, denote by $\mathrm{root}(A)$ the root of $A$, i.e., the
node of $A$ with smallest depth. Denote by $A^* = A \setminus \{\mathrm{root}(A)\}$ the forest obtained from $A$ by removing
its root. Denote by $|A|$ the number of nodes of $A$; hence, $|A^*|$ represents its number of edges. Denote
by $\mathrm{parent}_T(u)$ the parent of the node $u$ in $T$. Let $T[u,v]$ denote the nodes on the simple path from
$u$ to $v$ in $T$. The variants $T(u,v]$ and $T[u,v)$ denote the same path without the first and last node,
respectively.

Graphs. Throughout we assume graphs to be connected. If a graph is not connected, we can add
$O(\log n)$ bits to each label, indicating the connected component of the node, and then handle components
separately. We denote by $\mathrm{dist}_G(u,v)$ the minimum distance (counted with edge weights) of a path in
$G$ connecting the nodes $u$ and $v$.

Representing numbers and accessing them. We will need to encode numbers with base different
from 2 and sometimes compute prefix sums on a sequence of numbers. We apply some existing results:

Lemma 2.1 ([49]). A table with $n$ integral entries in $[-k,k]$ can be represented in a data structure of
$O(n\log k)$ bits to support prefix sums in constant time.

Lemma 2.2 ([24]). A table with $n$ elements from a finite alphabet $\sigma$ can be represented in a data
structure of $\lceil n\log|\sigma|\rceil$ bits, such that any element of the table can be read or written in constant time.
The data structure requires $O(\log n)$ precomputed word constants.

Lemma 2.3 (simple arithmetic coding). A table with $n$ elements from an alphabet $\sigma$ can be represented
in a data structure of $\lceil n\log|\sigma|\rceil$ bits.
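
To make Lemma 2.3 concrete, the following Python sketch packs a sequence of integers from $[-W,W]$ into a single number written in base $2W+1$, which is one simple way to reach $\lceil t\log(2W+1)\rceil$ bits for $t$ values. The function names `pack`, `unpack` and `code_length_bits` are ours and purely illustrative; the sketch makes no attempt at constant-time access to individual entries.

```python
def pack(values, W):
    """Pack integers from [-W, W] into a single base-(2W+1) number."""
    base = 2 * W + 1
    code = 0
    for v in reversed(values):
        if not -W <= v <= W:
            raise ValueError("value outside [-W, W]")
        code = code * base + (v + W)  # shift each value to a digit in [0, 2W]
    return code


def unpack(code, t, W):
    """Recover the t original integers from the packed number."""
    base = 2 * W + 1
    values = []
    for _ in range(t):
        code, digit = divmod(code, base)
        values.append(digit - W)
    return values


def code_length_bits(t, W):
    """Bits that suffice for any packed sequence of length t: ceil(t * log2(2W+1))."""
    return ((2 * W + 1) ** t - 1).bit_length()


deltas = [1, -1, 0, 1, 1]
packed = pack(deltas, W=1)
```

For five values with $W = 1$ the packed number fits in 8 bits, whereas storing each value in its own 2-bit field would take 10 bits.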

3 Warm-up 

This section presents, as a warm-up, a distance labeling scheme which does not achieve the strongest 
combination of label size and decoding time, but which uses some of the techniques that we will employ 
later to achieve our results. For nodes x, u, v, define 

$\delta_x(u,v) = \mathrm{dist}_G(x,v) - \mathrm{dist}_G(x,u)$.

Note that the triangle inequality entails that

$-\mathrm{dist}_G(u,v) \le \delta_x(u,v) \le \mathrm{dist}_G(u,v)$.

In particular, $\delta_x(u,v) \in [-W,W]$ whenever $u,v$ are adjacent.

Given a path $v_0,\ldots,v_t$ of nodes in $G$, the telescoping property of $\delta_x$-values means that

$\delta_x(v_0,v_t) = \sum_{i=1}^{t} \delta_x(v_{i-1},v_i)$.

Since $v_{i-1}$ and $v_i$ are adjacent, we can encode the $\delta_x$-values above as a table with $t$ entries, in which
each entry is an element from the alphabet $[-W,W]$ with $2W+1$ values. Using Lemma 2.3 we can
encode this table with $\lceil t\log(2W+1)\rceil$ bits. Note that we can compute $\mathrm{dist}_G(x,v_t)$ from $\mathrm{dist}_G(x,v_0)$ by
adding a prefix sum of the sequence of $\delta_x$-values:

$\mathrm{dist}_G(x,v_t) = \mathrm{dist}_G(x,v_0) + \sum_{i=1}^{t} \delta_x(v_{i-1},v_i)$.
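
The telescoping identity can be checked on a small unweighted example. The sketch below is only an illustration under our own assumptions: the graph is a 6-cycle, distances come from a plain breadth-first search, and the helper names `bfs_dist` and `deltas_along_path` are hypothetical.

```python
from collections import deque
from itertools import accumulate


def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def deltas_along_path(adj, x, path):
    """delta_x(v_{i-1}, v_i) for consecutive nodes of the path."""
    d = bfs_dist(adj, x)
    return [d[path[i]] - d[path[i - 1]] for i in range(1, len(path))]


# A 6-cycle 0-1-2-3-4-5-0 and the path 0,1,...,5 seen from x = 2.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path = [0, 1, 2, 3, 4, 5]
x = 2
d_x = bfs_dist(adj, x)
deltas = deltas_along_path(adj, x, path)

# Prefix sums of the deltas, offset by dist(x, v_0), give dist(x, v_t) for every t.
recovered = [d_x[path[0]] + s for s in accumulate(deltas, initial=0)]
```

Every entry of `deltas` lies in $[-1,1]$, and `recovered` reproduces the distances from $x$ to each node on the path.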

The Hamiltonian number of $G$ is the number $h(G)$ of edges of a Hamiltonian walk in $G$, i.e. a closed
walk of minimal length (counted without weights) that visits every node in $G$. It is well-known that
$n \le h(G) \le 2n-2$, the first inequality being an equality iff $G$ is Hamiltonian, and the latter being an
equality iff $G$ is a tree (in which case the Hamiltonian walk is an Euler tour); see [15, 37].

Consider a Hamiltonian walk $v_0,\ldots,v_{h-1}$ of length $h = h(G)$. Given nodes $x, y$ from $G$, we can find
$i,j$ such that $x = v_i$ and $y = v_j$. Without loss of generality we can assume that $i < j$. If $j \le i + h/2$,
we can compute $\mathrm{dist}_G(x,y)$ as the sum of at most $\lfloor h/2\rfloor$ $\delta_x$-values:

$\mathrm{dist}_G(x,y) = \mathrm{dist}_G(v_i,v_j) = \sum_{k=i}^{j-1} \delta_x(v_k,v_{k+1})$.

If, on the other hand, $j > i + h/2$, then we can compute $\mathrm{dist}_G(x,y)$ as the sum of at most $\lfloor h/2\rfloor$
$\delta_y$-values:

$\mathrm{dist}_G(x,y) = \mathrm{dist}_G(v_j,v_i) = \sum_{k=j}^{i-1} \delta_y(v_k,v_{k+1})$,

where we have counted indices modulo $h$ in the last expression. This leads to the following distance
labeling scheme. For each node $x$ in $G$, assign a label $\ell(x)$ consisting of

• a number $i \in [0,h-1]$ such that $x = v_i$, and

• the $\lfloor h/2\rfloor$ values $\delta_x(v_k,v_{k+1})$ for $k = i,\ldots,i+\lfloor h/2\rfloor-1 \pmod h$.

From the above discussion it follows that the labels $\ell(x)$ and $\ell(y)$ for any two nodes $x, y$ are sufficient
to compute $\mathrm{dist}_G(x,y)$.
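
The warm-up scheme can be sketched in a few lines of Python. This is a minimal illustration, not the construction used for the final bounds: it handles unweighted graphs only, it expects a closed walk through all nodes to be supplied by the caller, it stores the delta values as a plain list instead of the compact encoding of Lemma 2.3, and it assumes the decoder knows $h$. The names `build_labels` and `decode_distance` are ours.

```python
from collections import deque


def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def build_labels(adj, walk):
    """walk = v_0..v_{h-1}: a closed walk visiting every node (v_{h-1} is adjacent to v_0)."""
    h = len(walk)
    half = h // 2
    labels = {}
    for x in adj:
        i = walk.index(x)                      # some position of x on the walk
        d = bfs_dist(adj, x)
        deltas = [d[walk[(k + 1) % h]] - d[walk[k % h]] for k in range(i, i + half)]
        labels[x] = (i, deltas)
    return labels


def decode_distance(label_x, label_y, h):
    """Distance from the two labels alone (h is assumed known to the decoder)."""
    (i, deltas_x), (j, deltas_y) = label_x, label_y
    steps = (j - i) % h                        # walk edges from v_i forward to v_j
    if steps <= h // 2:
        return sum(deltas_x[:steps])           # delta_x(v_i, v_j) = dist(x, y)
    return sum(deltas_y[:h - steps])           # delta_y(v_j, v_i) = dist(y, x)


# A small tree with edges 0-1, 1-2, 1-3; its Euler tour is a closed walk with h = 2n - 2 = 6.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
walk = [0, 1, 2, 1, 3, 1]
labels = build_labels(adj, walk)
```

Calling `decode_distance(labels[x], labels[y], len(walk))` for any pair of nodes should agree with a breadth-first search from $x$.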

We can encode $\ell(x)$ with $\lceil\frac{1}{2}h\log(2W+1)\rceil + \lceil\log h\rceil$ bits using Lemma 2.3. If $G$ is Hamiltonian,
this immediately gives a labeling scheme of size $\lceil\frac{1}{2}n\log(2W+1)\rceil + \lceil\log n\rceil$. In the general case, we get
size $\lceil(n-1)\log(2W+1)\rceil + \lceil\log n\rceil$, which for $W = 1$ matches Winkler’s [65] result when disregarding
second order terms. Theorem 4.1 in the next section shows that it is possible to obtain labels of size
$\frac{1}{2}n\log(2W+1) + O(\log n\cdot\log(nW))$ even in the general case. Theorem 5.3 in the section that follows
shows that we can obtain constant time decoding with $o(n)$ extra space.

4 A scheme of size $\frac{1}{2}n\log(2W+1)$

We now show how to construct a distance labeling scheme of size $\frac{1}{2}n\log(2W+1) + O(\log n\cdot\log(nW))$.

First, we recall the heavy-light decomposition of trees [57]. Let $T$ be a rooted tree. The nodes of $T$
are classified as either heavy or light as follows. The root $r$ of $T$ is light. For each non-leaf node $v$, pick
one child $w$ where $|T_w|$ is maximal among the children of $v$ and classify it as heavy; classify the other
children of $v$ as light. The apex of a node $v$ is the nearest light ancestor of $v$. By removing the edges
between light nodes and their parents, $T$ is divided into a collection of heavy paths. Any given node $v$
has at most $\log n$ light ancestors (see [57]), so the path from the root to $v$ goes through at most $\log n$
heavy paths.

Now, enumerate the nodes in $T$ in a depth-first manner where heavy children are visited first. Denote
the number of a node $v$ by $\mathrm{dfs}(v)$. Note that nodes on a heavy path will have numbers in consecutive
order; in particular, the root node $r$ will have number $\mathrm{dfs}(r) = 0$, and the nodes on its heavy path will
have numbers $0,1,\ldots$. Assign to each node $v$ a label $\ell_T(v)$ consisting of the sequence of dfs-values of
its first and last ancestor on each heavy path, ordered from the top of the tree and down to $v$. Note
that the first ancestor on a heavy path will be the apex of that heavy path and will be light, whereas
the last ancestor on a heavy path will be the parent of the apex of the subsequent heavy path. This
construction is similar to the one used in [6] for nearest common ancestor (NCA) labeling schemes,
although with larger sublabels. Indeed, the label $\ell_T(v)$ is a sequence of at most $2\log n$ numbers from
$[0,n)$. We can encode this sequence with $O(\log^2 n)$ bits.
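
The heavy-first numbering and the labels $\ell_T(v)$ can be sketched as follows. This is an illustrative Python sketch under our own assumptions: the tree is given as a dictionary of child lists, ties between equally large children are broken by list order, and the function name `heavy_light_labels` is hypothetical.

```python
def heavy_light_labels(children, root):
    """Return (dfs, labels): heavy-first dfs numbers and the label l_T(v) of every node."""
    # Subtree sizes, computed bottom-up over a pre-order listing.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children[v])

    dfs, labels = {}, {}
    # Stack entries: (node, label of the completed heavy paths above, apex number or None).
    stack = [(root, (), None)]
    while stack:
        v, prefix, apex = stack.pop()
        dfs[v] = len(dfs)
        if apex is None:                      # v is light, so it starts a new heavy path
            apex = dfs[v]
        labels[v] = prefix + (apex, dfs[v])   # (l_1, h_1, ..., l_t, h_t) with h_t = dfs(v)
        if children[v]:
            heavy = max(children[v], key=lambda c: size[c])
            for c in children[v]:
                if c != heavy:                # light child: v is the last ancestor on its path
                    stack.append((c, labels[v], None))
            stack.append((heavy, prefix, apex))   # pushed last, hence visited first
    return dfs, labels


# Root 0 with children 1 and 2; node 1 has children 3 and 4; node 3 has child 5.
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [5], 4: [], 5: []}
dfs, labels = heavy_light_labels(children, 0)
```

In this example the root's heavy path is 0, 1, 3, 5 and receives the numbers 0 to 3, while the light node 4 gets the label (0, 1, 4, 4): its root path leaves the first heavy path at number 1 and then consists of node 4 alone.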

Suppose that the node $v$ has label $\ell_T(v) = (l_1,h_1,\ldots,l_t,h_t)$, where $l_1 = \mathrm{dfs}(r) = 0$ and $h_t = \mathrm{dfs}(v)$
and where $l_i,h_i$ are the numbers of the first and last ancestor, respectively, on the $i$’th heavy path
visited on the path from the root to $v$. Since nodes on heavy paths are consecutively enumerated, it
follows that the nodes on the path from the root to $v$ are enumerated

$0 = l_1,\ldots,h_1,l_2,\ldots,h_2,\ldots,l_t,\ldots,h_t$,

where duplicates may occur in the cases where $l_i = h_i$, which happens when the first and last ancestor
on a heavy path coincide.

In addition to the label $\ell_T(v)$, we also store the label $\ell'_T(v)$ consisting of the sequence of distances
$\mathrm{dist}_G(l_i,v)$ and $\mathrm{dist}_G(h_i,v)$. This label is a sequence of at most $2\log n$ numbers smaller than $nW$, and
hence we can encode $\ell'_T(v)$ with $O(\log n\cdot\log(nW))$ bits. Combined, $\ell_T(v)$ and $\ell'_T(v)$ can be encoded
with $O(\log n\cdot\log(nW))$ bits.

Now consider a connected graph $G$ with shortest-path tree $T$ rooted at some node $r$. Using the
above enumeration of nodes, we can construct a distance labeling scheme in the same manner as in
Section 3, except that instead of using a Hamiltonian walk, we use the dfs-enumeration of nodes in $T$
from above, and we save only $\delta_x$-values between nodes and their parents, using $\lceil\frac{1}{2}n\log(2W+1)\rceil$ bits
due to Lemma 2.3. More specifically, for each node $x$, we assign a label $\ell(x)$ consisting of

• the labels $\ell_T(x)$ and $\ell'_T(x)$ as described above; and

• the $\lfloor n/2\rfloor$ values $\delta_x(\mathrm{parent}(v),v)$ for all $v$ with $\mathrm{dfs}(x) < \mathrm{dfs}(v) \le \mathrm{dfs}(x) + \lfloor n/2\rfloor \pmod n$.

We can encode the above with $\frac{1}{2}n\log(2W+1) + O(\log n\cdot\log(nW))$ bits.

Given nodes $x \ne y$, either $\ell(x)$ will contain $\delta_x(\mathrm{parent}(y),y)$ or $\ell(y)$ will contain $\delta_y(\mathrm{parent}(x),x)$.
Without loss of generality, we may assume that $\ell(x)$ contains $\delta_x(\mathrm{parent}(y),y)$. Let $z$ denote the nearest
common ancestor of $x$ and $y$. Note that $z$ must be the last ancestor of either $x$ or $y$ on some heavy path,
meaning that $\mathrm{dfs}(z)$ appears in either $\ell_T(x)$ or $\ell_T(y)$. By construction of depth-first search, a node $v$
on the path from (but not including) $z$ to (and including) $y$ will have a dfs-number $\mathrm{dfs}(v)$ that satisfies
the requirements to be stored in $\ell(x)$. Thus, $\ell(x)$ must, in fact, contain $\delta_x$-values for all nodes in $T(z,y]$.

Next, note that, since $T$ is a shortest-path tree, $\mathrm{dist}_G(x,z) = \mathrm{dist}_T(x,z)$. Now, if $z$ appears in $\ell_T(x)$,
we can obtain $\mathrm{dist}_T(x,z)$ directly from $\ell'_T(x)$; else, $z$ must appear in $\ell_T(y)$, and we can then obtain
$\mathrm{dist}_T(z,y)$ from $\ell'_T(y)$ and compute $\mathrm{dist}_T(x,z) = \mathrm{dist}_T(x,r) - \mathrm{dist}_T(r,y) + \mathrm{dist}_T(z,y)$. In either case,
we can now compute the distance in $G$ between $x$ and $y$ as

$\mathrm{dist}_G(x,y) = \mathrm{dist}_G(x,z) + \sum_{v\in T(z,y]} \delta_x(\mathrm{parent}(v),v)$.

The label of $x$ contains all the needed $\delta_x$-values, and $\ell_T(x)$ and $\ell_T(y)$ combined allow us to determine
the dfs-numbers of the nodes on $T(z,y]$, so that we know exactly which $\delta_x$-values from $x$’s label to pick
out. Thus we have proved:

Theorem 4.1. There exists a distance labeling scheme for graphs with label size $\frac{1}{2}n\log(2W+1) + O(\log n\cdot\log(nW))$.
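
The decoding identity behind Theorem 4.1 can be sanity-checked on a small unweighted graph. The sketch below does not build the labels themselves; it only verifies, under our own simplifying assumptions, that $\mathrm{dist}_T(x,z)$ plus the $\delta_x$-values on $T(z,y]$ equals $\mathrm{dist}_G(x,y)$ when $T$ is a breadth-first (hence shortest-path) tree. A full breadth-first search from $x$ stands in for the $\delta_x$-values stored in the label of $x$, and the names `bfs_tree` and `distance_via_nca` are hypothetical.

```python
from collections import deque


def bfs_tree(adj, root):
    """Return (dist, parent) of a BFS tree, i.e. a shortest-path tree for unit weights."""
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], parent[w] = dist[u] + 1, u
                queue.append(w)
    return dist, parent


def distance_via_nca(adj, depth, parent, x, y):
    """dist_T(x, z) plus the delta_x-values on T(z, y], where z = NCA(x, y)."""
    d_x, _ = bfs_tree(adj, x)              # stands in for the deltas stored in x's label
    ancestors_of_x = set()
    v = x
    while v is not None:
        ancestors_of_x.add(v)
        v = parent[v]
    delta_sum, v = 0, y
    while v not in ancestors_of_x:         # climb from y up to z
        delta_sum += d_x[v] - d_x[parent[v]]
        v = parent[v]
    z = v
    return (depth[x] - depth[z]) + delta_sum   # dist_T(x, z) equals dist_G(x, z)


# A 5-cycle 0-1-2-3-4-0 with a pendant node 5 attached to node 3.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5], 4: [3, 0], 5: [3]}
depth, parent = bfs_tree(adj, 0)
```

For instance, nodes 2 and 3 are adjacent through a non-tree edge; their nearest common ancestor is the root, and the identity still returns distance 1 because the $\delta_x$-values along the tree path sum to $-1$.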
This gives us the first row of Table 3. To obtain the second row, we encode the $\delta_x$-values with
Lemma 2.2. Doing this we can access each value in constant time and simply traverse in $O(n)$ time
the path from $y$ to $z$, adding $\delta_x$-values along the way. Note, however, that Lemma 2.2 only applies for
$W = O(1)$. Saving the $\delta_x$-values in a prefix sum structure as described in Lemma 2.1, we can compute
the sum using $\log n$ look-ups. The next section describes how we can avoid spending $O(\log n)$ time (or
more) on this, while still keeping the same label size.

For unweighted ($W = 1$), bipartite graphs, $\delta_x$-values between adjacent nodes can never be 0, which
means that we only need to consider two rather than three possible values. Thus, we get label size
$\frac{1}{2}n + O(\log^2 n)$ instead in this case. We shall give no further mention to this in the following.

5 Constant query time 

Let $T$ be any rooted spanning tree of the connected graph $G$ with $n$ nodes. We create an edge-partition
$\mathcal{T} = \{T_1,T_2,\ldots\}$ of $T$ into rooted subtrees, called micro trees. Each micro tree has at most $\beta$ edges, and
the number of micro trees is $|\mathcal{T}| = O(n/\beta)$. We later choose the value of $\beta$. For completeness we give a
proof (Lemma B.1) in the appendix of the existence of such a construction. Observe that the collection
$\{T_i^*\}$ forms a partition of the nodes of $T^*$. As the parent relationship in $T_i$ coincides with the one of
$T$, we have $\mathrm{parent}_{T_i}(u) = \mathrm{parent}_T(u)$ for all $u \in T_i^*$.

For every node $u \in T^*$, we denote by $i(u)$ the unique index $i$ such that $u \in T_i^*$. For a node $u$ of $T^*$
we let $\mathrm{MicroRoot}(u) = \mathrm{root}(T_{i(u)})$, and for $r = \mathrm{root}(T)$ let $\mathrm{MicroRoot}(r) = r$.

Define the macro tree $M$ to have node set $\{\mathrm{MicroRoot}(u) \mid u \in G\}$ and an edge between
$\mathrm{MicroRoot}(u)$ and $\mathrm{MicroRoot}(\mathrm{MicroRoot}(u))$ for all $u \ne r$.

By construction, $M$ has $O(n/\beta)$ nodes.

Our labeling scheme will compute the distance from $x$ to $y$ as

$\mathrm{dist}_G(x,y) = \mathrm{dist}_G(x,r) + \delta_x(r,\mathrm{MicroRoot}(y)) + \delta_x(\mathrm{MicroRoot}(y),y)$.

The first addend, $\mathrm{dist}_G(x,r)$, is saved as part of $x$’s label using $\log n + \log W$ bits. The second addend
can be computed as a sum of $\delta_x$-values for nodes in the macro tree and is hence referred to as the macro
sum. The third addend can be computed as a sum of $\delta_x$-values for nodes inside $y$’s micro tree and is
hence referred to as the micro sum. The next two sections explain how to create data structures that
allow us to compute these values in constant time.

5.1 Macro sum 

Consider the macro tree $M$ with $O(n/\beta)$ nodes. As mentioned in Section 3 there exists a Hamiltonian
walk $v_0,\ldots,v_{h-1}$ of length $h = O(n/\beta)$, where we can assume that $v_0 = r$. Given nodes $x,y \in G$,
consider a path in $M$ along such a Hamiltonian walk from $r$ to $\mathrm{MicroRoot}(y)$. This is a subpath
$v_0,\ldots,v_t$ of the Hamiltonian walk, where $t$ is chosen such that $v_t = \mathrm{MicroRoot}(y)$. Note that

$\delta_x(r,\mathrm{MicroRoot}(y)) = \delta_x(v_0,v_t) = \sum_{i=0}^{t-1} \delta_x(v_i,v_{i+1})$.

Since each edge in $M$ connects two nodes that belong to the same micro tree, and the distance within
each micro tree is at most $\beta W$, we have that $\delta_x(v_i,v_{i+1}) \in [-\beta W,\beta W]$ for all $i$. Using Lemma 2.1 we can
store these $\delta_x$-values in a data structure, $\mathrm{PreFix}_x$, of size $O((n/\beta)\log(2\beta W+1)) = O(n\log(\beta W)/\beta)$
such that prefix sums can be computed in constant time. This data structure is stored in $x$’s label.
An index $t$ with $v_t = \mathrm{MicroRoot}(y)$ is stored in $y$’s label using $O(\log(n/\beta))$ bits. These two pieces of
information combined allow us to compute $\delta_x(r,\mathrm{MicroRoot}(y))$ for all $y$.

Label summary: For a (pre-selected) Hamiltonian walk $v_0,\ldots,v_{h-1}$ in $M$, we store in the label
of each node $x$ a data structure $\mathrm{PreFix}_x$ of size $O(n\log(\beta W)/\beta)$ such that prefix sums in the form
$\sum_{i=0}^{t-1}\delta_x(v_i,v_{i+1})$ can be computed in constant time. In addition, we store in the label of $x$ an index
$m(x)$ such that $v_{m(x)} = \mathrm{MicroRoot}(x)$, which requires $O(\log(n/\beta))$ bits.

5.2 Micro sum 

For any node $v \ne r$, define

$\delta_x(v) = \delta_x(\mathrm{parent}_T(v),v)$.

Note that, for a node $y \in T_i^*$, $\delta_x(\mathrm{MicroRoot}(y),y)$ is the sum of the values $\delta_x(v_j)$ for all nodes $v_j \in T_i^*$
lying on the path from $\mathrm{MicroRoot}(y)$ to $y$. Each of these $\delta_x$-values is a number in $[-W,W]$.

For each $i$, order the nodes in $T_i^*$ in any order. For each node $x$ and index $i$, let
$\delta_x(T_i^*) = (\delta_x(v_1),\ldots,\delta_x(v_{|T_i^*|}))$, where $v_1,\ldots,v_{|T_i^*|}$ is the ordered sequence of nodes from $T_i^*$. We will construct
our labels such that $x$’s label stores $\delta_x(T_i^*)$ for half of the total set of delta values (we will see how
in the next section), and such that $y$’s label stores information about for which $j$’s the node $v_j$ lies
on the path between $\mathrm{MicroRoot}(y)$ and $y$. With these two pieces of information, we can compute
$\delta_x(\mathrm{MicroRoot}(y),y)$ as described above.

We define $f(W) = 2W+1$. The sequence $\delta_x(T_i^*)$ consists of $|T_i^*|$ values from $[-W,W]$ and can be
encoded with $|T_i^*|\lceil\log f(W)\rceil$ bits. To store this more compactly, we will use an injective function, as
described in Lemma 2.3, that maps every sequence of $t$ integers from $[-W,W]$ into a bit string of length
$\lceil t\log f(W)\rceil$. Denote by $\mathrm{code}(\delta_x(T_i^*))$ such an encoding of the sequence $\delta_x(T_i^*)$ to a bit string of length
$\lceil|T_i^*|\log f(W)\rceil \le \lceil\beta\log f(W)\rceil$, as $|T_i^*| \le \beta$.

In order to decode the encoded version of $\delta_x(T_i^*)$ in constant time, we construct a tabulated inverse
function $\mathrm{code}^{-1}$. From the input and output sizes, we see that we need a table with $2^{\lceil\beta\log f(W)\rceil}$ entries
for each of the $\beta$ possible micro tree sizes, with each result entry having $\beta\lceil\log f(W)\rceil$ bits, giving a total
space of $\beta^2 2^{\lceil\beta\log f(W)\rceil}\lceil\log f(W)\rceil$ bits.

Let $T_i = T_{i(y)}$. Let & be the bitwise AND operator. In node $y$’s label we save the bit string $\mathrm{mask}(y)$
such that $\mathrm{mask}(y)$ & $\delta_x(T_i^*)$ gives an integer sequence $S$ identical to $\delta_x(T_i^*)$, except that the integer
$\delta_x(v)$ has been replaced by 0 for all $v$ that are not an ancestor of $y$. Given $S$ we can now compute the
micro sum $\delta_x(\mathrm{MicroRoot}(y),y)$ as the sum of integers in the sequence $S$. We will create a tabulated
function that sums these integers, Sumintegers. Sumintegers is given a sequence of up to $\beta$ values in
$[-W,W]$, and the output is a number in $[-\beta W,\beta W]$. We can thus tabulate Sumintegers as a table with
$\beta 2^{\beta\lceil\log f(W)\rceil}$ entries each of size $\lceil\log f(\beta W)\rceil$, giving a total space of $\beta 2^{\beta\lceil\log f(W)\rceil}\lceil\log f(\beta W)\rceil$.
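
The masking step and the tabulated Sumintegers function can be illustrated with a small Python sketch. Several choices here are our own assumptions rather than details fixed by the text: each $\delta_x$-value is stored in a two's complement field of $\lceil\log f(W)\rceil$ bits so that the value 0 is the all-zero bit pattern, the table is indexed directly by the masked word, and the names `pack_fields`, `ancestor_mask` and `build_sum_table` are hypothetical.

```python
def pack_fields(values, width):
    """Store each value in its own two's complement field of `width` bits."""
    word = 0
    for k, v in enumerate(values):
        word |= (v & ((1 << width) - 1)) << (k * width)
    return word


def field_value(word, k, width):
    """Read field k back as a signed integer."""
    raw = (word >> (k * width)) & ((1 << width) - 1)
    return raw - (1 << width) if raw >> (width - 1) else raw


def ancestor_mask(on_path, width):
    """All-ones fields for nodes on the path from MicroRoot(y) to y, zeros elsewhere."""
    mask = 0
    for k, keep in enumerate(on_path):
        if keep:
            mask |= ((1 << width) - 1) << (k * width)
    return mask


def build_sum_table(beta, width):
    """Tabulate the sum of the beta fields for every possible word."""
    return [sum(field_value(word, k, width) for k in range(beta))
            for word in range(1 << (beta * width))]


W, beta = 1, 4
width = W.bit_length() + 1                    # equals ceil(log2(2W + 1)); 2 bits for W = 1
table = build_sum_table(beta, width)          # shared by all nodes

word = pack_fields([1, -1, 0, 1], width)      # delta_x-values of one micro tree (from x's label)
mask = ancestor_mask([True, False, True, True], width)   # from y's label
micro_sum = table[word & mask]                # one AND and one table look-up
```

Here the second node of the micro tree is not an ancestor of $y$, so its value $-1$ is zeroed out by the mask and the micro sum is $1 + 0 + 1 = 2$.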

Both functions, $\mathrm{code}^{-1}$ and Sumintegers, have been tabulated in the above. A lookup in a tabulated
function can be done in constant time on the RAM as long as both input and output can be represented
by $O(\log n)$ bits. We can achieve this by setting

$\beta \le \frac{c\log n}{\lceil\log f(W)\rceil}$

for a constant $c$. To see this, note that the maximum of the four input and output values above is
$\lceil\log\beta\rceil + \beta\lceil\log f(W)\rceil$. Using the above inequality then gives $\log\log n + c\log n = O(\log n)$.

The tables for the tabulated functions are the same for all nodes. Hence, in principle, assuming an
upper bound for $n$ is known, we could encode the two tables in global memory, not using space in the
labels. However, as we will see, the tables take no more space than the prefix table $\mathrm{PreFix}_x$, so we can
just as well encode them into the labels. Doing that we use an additional $\beta^2 2^{\lceil\beta\log f(W)\rceil}\lceil\log f(W)\rceil$ bits
for the $\mathrm{code}^{-1}$ table and $\beta 2^{\beta\lceil\log f(W)\rceil}\lceil\log f(\beta W)\rceil$ bits for the Sumintegers table. Using that $W = o(n)$ and
substituting $\beta$ for the above expression then gives, after a few reductions, that the extra space used is
no more than $O((\log n)^2 n^c)$ bits. Since the prefix table uses $\Omega(n/\log n)$ bits, we see that the
added space does not (asymptotically) change the total space usage, as long as we choose $c < 1$.

Label summary: We will construct the labels such that either $x$’s label contains $\delta_x(T^*_{i(y)})$ or vice
versa (we shall see how in the next section). Using the tabulated function $\mathrm{code}^{-1}$, the bits in $\delta_x(T^*_{i(y)})$
can be extracted in constant time from $x$’s label. Using $\mathrm{mask}(y)$ from $y$’s label and the tabulated
function Sumintegers, we can then compute $\delta_x(\mathrm{MicroRoot}(y),y)$ in constant time. The total space used
for all this is no more than 


5.3 Storing and extracting the deltas 


Let the micro trees in $\mathcal{T}$ be given in a specific order: $T_1,\ldots,T_{|\mathcal{T}|}$. Let $D(x) = \mathrm{code}(\delta_x(T_1^*))\cdots\mathrm{code}(\delta_x(T_{|\mathcal{T}|}^*))$
denote the binary string composed of the concatenation of each string
$\mathrm{code}(\delta_x(T_i^*))$ in the order $i = 1,2,\ldots,|\mathcal{T}|$.

Let $L = |D(x)|$ be the length in bits of $D(x)$. Let $p_i \in [0,L)$ be the position in the string $D(x)$ where
the substring $\mathrm{code}(\delta_x(T_i^*))$ starts. E.g., $D(x)[0] = D(x)[p_1]$ is the first bit of $\mathrm{code}(\delta_x(T_1^*))$, $D(x)[p_2]$
the first bit of $\mathrm{code}(\delta_x(T_2^*))$, and so on. According to Lemma 2.3 we have $p_i = \sum_{j<i}\lceil|T_j^*|\log f(W)\rceil$.

Observe that the position $p_i$ only depends on $i$ and $W$ and not on $x$.

We denote by $a(y)$ and $a'(y)$ the starting and ending positions of the substring $\mathrm{code}(\delta_x(T^*_{i(y)}))$ in
$D(x)$. More precisely, $a(y) = p_{i(y)}$ and $a'(y) = p_{i(y)+1}-1$, so that $|\mathrm{code}(\delta_x(T^*_{i(y)}))| = a'(y)-a(y)+1$.
For each node $y$ we use $O(\log n)$ bits to store $a(y)$ and $a'(y)$ in its label.

For a node $x$ we will only save approximately half of $D(x)$, in a table $H(x)$. $H(x)$ will start
with $\mathrm{code}(\delta_x(T^*_{i(x)}))$ and the code for the following micro trees in the given circular order until
$H(x)$ in total has at least $n/2$ $\delta_x$-values, but as few as possible. In other words
$H(x) = \mathrm{code}(\delta_x(T^*_{i(x)}))\cdots\mathrm{code}(\delta_x(T^*_{j(x)}))$, where the indexes $i(x),i(x)+1,\ldots,j(x)$ may wrap to 1 after reaching
the largest index $|\mathcal{T}|$ if $j(x) < i(x)$. Let $b(x) = p_{j(x)+1}$.

In a node $x$’s label we save $a(x)$, $a'(x)$, $b(x)$ and $L$ using $O(\log n)$ bits. Having those values we know
which $\delta_x$-values from $D(x)$ are saved in $x$’s label as well as the position of them in $H(x)$. Furthermore
we know the position of the $\delta_x$-values of $x$’s own micro tree in $D(x)$. We will need to extract at most
$\lceil\beta\log f(W)\rceil = O(\log n)$ consecutive bits from $H(x)$ in one query. On the word-RAM this can be done
in constant time.


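Proposition 5.1. Let $x, y$ be two nodes of $G$. Then,

(i) $|H(x)| = \frac{1}{2} n \log f(W) + O\!\left(\frac{n}{\log n} \log f(W)\right)$; and

(ii) either $\mathrm{code}(\delta_x(T_{i(y)}^*))$ is part of $H(x)$ or $\mathrm{code}(\delta_y(T_{i(x)}^*))$ is part of $H(y)$.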


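Proof. Let $\mathcal{T}'$ be the subset of $\mathcal{T}$ encoded in $H(x)$. We have:

$$
\begin{aligned}
|H(x)| &= \sum_{T_i \in \mathcal{T}'} \lceil |T_i^*| \log f(W) \rceil \le \sum_{T_i \in \mathcal{T}'} \left( |T_i^*| \log f(W) + 1 \right) \\
&\le \tfrac{1}{2} n \log f(W) + |\mathcal{T}| + \lceil \beta \log f(W) \rceil \le \tfrac{1}{2} n \log f(W) + O(n/\beta + \beta \log f(W)) \\
&\le \tfrac{1}{2} n \log f(W) + O\!\left( \frac{n}{\log n} \log f(W) \right),
\end{aligned}
$$

which proves (i). Part (ii) follows from the fact that $x$ saves at least half of the $\delta_x$'s in a cyclic order. If $y$ is not included here, $x$ must be included in the $\delta_y$-values saved by $y$. □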

5.4 Summary 

The label of $x$ is composed of the following items.

1. The values $a(x)$, $a'(x)$, $\mathrm{mask}(x)$, $m(x)$, $\mathrm{dist}_G(x, r)$, $L$ and $b(x)$: $O(\log n)$.

2. A prefix table, $\mathrm{Prefix}_x$, for the values in the macro tree: $O\!\left(\frac{n}{\log n}\left((\log f(W))^2 + \log\log n \log f(W)\right)\right)$.

3. The table $H(x)$: $\frac{1}{2} n \log f(W) + O\!\left(\frac{n}{\log n} \log f(W)\right)$.

4. Global tables, $\mathrm{code}^{-1}$ and SumIntegers of size

Note that $L$ and the global tables are common to all the nodes. In addition we may need to use $O(\log n)$ bits to save the start position in the label for the above constant number of sublabels.

Lemma 5.2. Every label has length at most $\frac{1}{2} n \log f(W) + O\!\left(\frac{n}{\log n}\left((\log f(W))^2 + \log\log n \log f(W)\right)\right)$ bits.

Let $\mathrm{Decode}(\ell(x, G), \ell(y, G))$ denote the distance returned by the decoder given the labels of $x$ and of $y$ in $G$. It is defined by:


Decode($\ell(x, G)$, $\ell(y, G)$):

1. If $(a(x) \le a(y) < b(x)) \lor (b(x) < a(x) \le a(y)) \lor (a(y) < b(x) < a(x))$ then $s = a(y) - a(x) \pmod L$ and $e = a'(y) - a(x) \pmod L$

2. Else return Decode($\ell(y, G)$, $\ell(x, G)$)

3. MacroSum = $\mathrm{Prefix}_x(m(y))$

4. $S = \mathrm{code}^{-1}(H_x[s, \ldots, e])$ & $\mathrm{mask}(y)$

5. MicroSum = SumIntegers($S$)

6. Return $\mathrm{dist}_G(x, r)$ + MicroSum + MacroSum
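Step 1 of the decoder is a membership test in a cyclic interval followed by two modular subtractions. The Python sketch below illustrates only that step. The function name `locate` is hypothetical, and the sketch assumes that $H(x)$ covers the cyclic bit range $[a(x), b(x))$ of $D(x)$, reading the comparison between $a(x)$ and $a(y)$ as non-strict so that $x$'s own micro tree is included.

```python
def locate(a_x, b_x, a_y, a_prime_y, L):
    """Return (s, e), the first and last bit of y's code inside H(x),
    or None when H(x) does not contain that code."""
    inside = ((a_x <= a_y < b_x)        # no wrap-around
              or (b_x < a_x <= a_y)     # H(x) wraps, y lies before the wrap
              or (a_y < b_x < a_x))     # H(x) wraps, y lies after the wrap
    if not inside:
        return None
    return (a_y - a_x) % L, (a_prime_y - a_x) % L

# Toy layout: codes start at 0, 5, 12, 16, 24 and D(x) has L = 28 bits.
# H(x) holds the codes starting at 16, 24 and 0, so a(x) = 16 and b(x) = 5.
print(locate(16, 5, 0, 4, 28))    # (12, 16): code found after the wrap
print(locate(16, 5, 5, 11, 28))   # None: the decoder would swap x and y
```

When the test fails, Proposition 5.1 (ii) guarantees that the symmetric call with the two labels swapped succeeds.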


Theorem 5.3. There exists a distance labeling scheme for graphs with edge weights in $[1, W]$ using labels of length $\frac{1}{2} n \log(2W + 1) + O\!\left(\frac{n}{\log n} \log(2W + 1)(\log W + \log\log n)\right)$ bits and constant decoding time.


6 Approximate distances 


By considering only a subset of nodes from G and using the previous techniques, it is possible to create 
an approximation scheme where the label size is determined by a smaller number of nodes but with 
larger weights between adjacent nodes. We leave the details for Appendix C.1 and present here only
the result. 

Theorem 6.1. There exists a $(2kW)$-additive distance labeling scheme for graphs with $n$ nodes and edge weights in $[1, W]$ using labels of size $\frac{1}{2(k+1)} n \log(2(k+1)W + 1) + O(\log n \cdot \log(nW))$.

Another way to achieve an approximation scheme is to use a smaller set of weights while keeping 
the accumulated error under control. This leads to the following result whose proof can be seen in 
Appendix C.2. 

Theorem 6.2. For any $D \le 2W - 1$ there exists a $\left\lceil \frac{D}{2W - D} \right\rceil$-additive distance labeling scheme for graphs with $n$ nodes and edge weights in $[1, W]$ using labels of size $\frac{1}{2} n \log(2W + 1 - D) + O(\log n \cdot \log(nW))$.

One instance of Theorem 6.2 is $D = W$, which gives a 1-additive distance labeling scheme of size $\frac{1}{2} n \log(W + 1) + o(n)$. For $D = 2W - 1$ we get a $(2W - 1)$-additive distance labeling scheme of size $\frac{1}{2} n + o(n)$. For constant $r$ the above technique also applies to our constant time decoding results. For unweighted graphs this implies that we can have labels of size $\frac{1}{2} n + o(n)$ with a 1-additive error and constant decoding time.
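The trade-off in Theorem 6.2 is easy to tabulate. The following Python sketch evaluates the additive error $\lceil D/(2W - D) \rceil$ and the leading term $\frac{1}{2} n \log(2W + 1 - D)$ of the label size for a given $W$ and $D$. The function names are illustrative, and the lower-order term $O(\log n \cdot \log(nW))$ is ignored.

```python
import math

def additive_error(W, D):
    """Error bound ceil(D / (2W - D)); requires 0 <= D <= 2W - 1."""
    if not 0 <= D <= 2 * W - 1:
        raise ValueError("D must satisfy 0 <= D <= 2W - 1")
    return -(-D // (2 * W - D))          # integer ceiling

def leading_bits(n, W, D):
    """Leading term (1/2) * n * log2(2W + 1 - D) of the label size."""
    return 0.5 * n * math.log2(2 * W + 1 - D)

n, W = 1000, 4
for D in (0, W, 2 * W - 1):
    print(D, additive_error(W, D), round(leading_bits(n, W, D), 1))
```

With $W = 4$ the three rows correspond to the exact scheme ($D = 0$), the 1-additive instance ($D = W$) and the $(2W - 1)$-additive instance with leading term $n/2$.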

By combining the above two theorems, we obtain the theorem below; see Appendix C.3. 

Theorem 6.3. For any $k > 0$ and $D \le 2(k+1)W - 1$ there exists a $\left(2kW + \left\lceil \frac{D}{2(k+1)W - D} \right\rceil\right)$-additive distance labeling scheme for graphs with $n$ nodes and edge weights in $[1, W]$ using labels of size $\frac{1}{2(k+1)} n \log(2(k+1)W + 1 - D) + O(\log n \cdot \log(nW))$ bits.

APPENDIX


A Lower bounds 


Our lower bound technique can be seen as a generalization of the classical counting argument for 
adjacency labeling schemes. Indeed, for r = 0 and W = 1, our formula yields (n — l)/2 bits, which is 
exactly the number of bits needed for adjacency. The lower bound we develop here is well-suited for 
small additive errors $r$. In particular, when $r < 2W$ we prove that labels of $\Omega(n \log(W/(r+1)))$ bits
are required for an $r$-additive distance labeling scheme.

Given an unweighted graph $B$ and an integer $W \ge 1$, denote by $\mathcal{F}_W(B)$ the family of all subgraphs
of $B$ whose edges are weighted by values taken from $[1, W]$.


Theorem A.1. Let $B$ be an unweighted graph with $n$ vertices, $m$ edges and girth at least $g$, and let $r, W$ be integers such that $r \in [0, (g-2)W)$. Then, every $r$-additive approximate distance labeling scheme for $\mathcal{F}_W(B)$ requires a total label length of at least $m \log(k+1)$, and thus a label of at least $(m/n) \log(k+1)$ bits, where

$$k = \left\lfloor \frac{g-2}{g-1} \left( \frac{W}{r+1} + 1 \right) \right\rfloor .$$


Proof. An $r$-approximate distance matrix for a weighted graph $G$ with vertex-set $[1, n]$ is an $n \times n$ matrix $M$ such that $\mathrm{dist}_G(x, y) \le M[x, y] \le \mathrm{dist}_G(x, y) + r$ for all vertices $x, y$ of $G$.

The basic idea of our lower bound technique is to show that $\mathcal{F}_W(B)$ contains a large set $\mathcal{S}$ of weighted graphs for which no two graphs can have the same $r$-approximate distance matrix. A crude observation is that an $r$-approximate distance matrix for each graph of $\mathcal{S}$ can be generated from the ordered list of all the labels provided by any $r$-additive approximate labeling scheme for $\mathcal{S}$. So, by a simple counting argument, the total label length must be at least $\log |\mathcal{S}|$. In particular, the labeling scheme must assign, for some vertex of some graph of $\mathcal{S}$, a label of at least $(\log |\mathcal{S}|)/n$ bits. We now construct such a set $\mathcal{S}$ with $|\mathcal{S}| = (k+1)^m$.

For the sake of the presentation, define $W_i = W - (k - i - 1)(r + 1)$ for $i = 0, \ldots, k$. Note that the $W_i$'s increase with $i$, and more precisely that $W_{i+1} = W_i + r + 1$. Moreover, we observe that:

Claim A.2. The following hold: $k \ge 1$, $W_0, \ldots, W_{k-1} \in [1, W]$, and $W_k \le (g-1)W_0$.

Before we give a formal proof of Claim A.2 (which is a basic calculation), we explain how to derive 
our lower bound. 

Consider the set $\mathcal{C}$ of all edge-colorings of $B$ into $k+1$ colors. More precisely, an edge-coloring $c \in \mathcal{C}$ is simply a function $c : E(B) \to [0, k]$ mapping each edge $e$ of $B$ to some integer $c(e) \in [0, k]$. Clearly, $|\mathcal{C}| = (k+1)^m$ since each of the $m$ edges of $B$ can receive $k+1$ distinct values.

With each coloring $c \in \mathcal{C}$, we associate a weighted graph $G$ with edge-weight function $w$ obtained from graph $B$ by testing the color of each edge $xy$ of $B$. If $c(xy) = k$, the edge is deleted. And, if $c(xy) = i < k$, we keep $xy$ in the graph and set $w(xy) = W_i$. We denote by $\mathcal{S}$ the family of graphs constructed by this process from all the colorings of $\mathcal{C}$. It is clear that, given $B$, one can recover from $G$ and $w$ the initial coloring $c$ (just scan all the possible edges of $B$, check if they exist in $G$ and look at their weights). In other words the construction is bijective and thus $|\mathcal{S}| = |\mathcal{C}| = (k+1)^m$.
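The construction can be sketched in a few lines of Python. The sketch assumes $k = \lfloor \frac{g-2}{g-1}(\frac{W}{r+1} + 1) \rfloor$ and $W_i = W - (k - i - 1)(r + 1)$; the function names and the representation of a weighted graph as a dictionary from edges to weights are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import floor

def weights(g, W, r):
    """Return [W_0, ..., W_k] with W_i = W - (k - i - 1)(r + 1)."""
    k = floor(Fraction(g - 2, g - 1) * (Fraction(W, r + 1) + 1))
    return [W - (k - i - 1) * (r + 1) for i in range(k + 1)]

def family(edges, g, W, r):
    """Yield one weighted graph {edge: weight} per coloring of `edges`.
    Color k deletes the edge; color i < k assigns weight W_i."""
    Ws = weights(g, W, r)
    k = len(Ws) - 1
    for coloring in product(range(k + 1), repeat=len(edges)):
        yield {e: Ws[c] for e, c in zip(edges, coloring) if c < k}

triangle = [(0, 1), (0, 2), (1, 2)]          # B = K_3, girth g = 3
print(weights(3, 3, 0))                       # [2, 3, 4]
graphs = list(family(triangle, g=3, W=3, r=0))
print(len(graphs))                            # 27 = (k + 1)^m with k = 2, m = 3
```

Every coloring yields a different dictionary, which mirrors the bijection used to count $|\mathcal{S}|$.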

By construction, each graph of $\mathcal{S}$ is a subgraph of $B$. Moreover, by Claim A.2, each weight is some integer $W_i \in [1, W]$ as $i \in [0, k-1]$ (edges of color $k$ have been removed). In other words, $\mathcal{S} \subseteq \mathcal{F}_W(B)$. It remains to prove that any two graphs of $\mathcal{S}$ cannot have the same $r$-approximate distance matrix. The intuition is that two graphs of $\mathcal{S}$ differ only when there is an edge $xy$ in $B$ whose color is different in the two graphs. Because of the choice of the edge-weights, the distance between $x$ and $y$ in the two graphs must, as we shall see, differ by at least $r + 1$.

Let $G, G'$ be two distinct weighted graphs of $\mathcal{S}$. Denote by $w, w'$ their respective edge-weighting functions, and by $M, M'$ any $r$-approximate distance matrices for $G$ and $G'$ respectively. As we will show, if $G$ and $G'$ are different, there must exist an edge $xy$ of $B$, and two colors $i, j \in [0, k]$, $i < j$, such that $\mathrm{dist}_G(x, y) \le W_i$ and $\mathrm{dist}_{G'}(x, y) \ge W_j$ (the case $\mathrm{dist}_{G'}(x, y) \le W_i$ and $\mathrm{dist}_G(x, y) \ge W_j$ is symmetric). For this purpose, we consider two cases:

(i) The graphs $G, G'$ are different because there is an edge $xy$ of $B$ which is in $G$ but not in $G'$. We have $xy \in E(G)$, which implies that $\mathrm{dist}_G(x, y) \le w(xy) \le W_{k-1}$ since the color of $xy$ in $G$ is $< k$. Further, $xy \notin E(G')$ implies that $\mathrm{dist}_{G'}(x, y) \ge (g-1)W_0$, since any path from $x$ to $y$ in $G'$ contains at least $g - 1$ edges ($G'$ is a subgraph of $B$ which has girth at least $g$), and the minimum weight assigned to any edge is $W_0$. (Note that this holds, in particular, when $x$ and $y$ are unconnected and $\mathrm{dist}_{G'}(x, y) = \infty$.) Thus from Claim A.2, $\mathrm{dist}_{G'}(x, y) \ge W_k$. So the claim holds for $i = k - 1$ and $j = k$.

(ii) The graphs $G, G'$ are different because there is an edge $xy$ in $G$ and $G'$ with different weights. Assuming that $w(xy) < w'(xy)$, there must exist $i, j$ such that $w(xy) = W_i$ and $w'(xy) = W_j$. Note that $i < j < k$. We have $\mathrm{dist}_G(x, y) = W_i$ and $\mathrm{dist}_{G'}(x, y) = W_j$ since we have seen in the previous case that every path from $x$ to $y$ excluding the edge $xy$ has cost at least $(g-1)W_0$. By Claim A.2, $W_i < W_j < W_k \le (g-1)W_0$.

In both cases we have found $i, j \in [0, k]$, $i < j$, such that $\mathrm{dist}_G(x, y) \le W_i$ and $\mathrm{dist}_{G'}(x, y) \ge W_j$. Now, by definition of $M$ and $M'$, $M[x, y] \le \mathrm{dist}_G(x, y) + r \le W_i + r$ and $W_j \le \mathrm{dist}_{G'}(x, y) \le M'[x, y]$. Since $W_j \ge W_{i+1} = W_i + r + 1$, we conclude that $M[x, y] < M'[x, y]$, proving that no two different graphs of $\mathcal{S}$ can have the same $r$-approximate distance matrix.
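The separation argument can be checked by brute force on a small instance. The Python sketch below builds every graph of the family for $B = K_4$ (girth 3), computes exact distances with Floyd-Warshall, and tests that any two graphs differ by at least $r + 1$ on some edge of $B$. The helper names are hypothetical, and the formula $k = \lfloor \frac{g-2}{g-1}(\frac{W}{r+1} + 1) \rfloor$ is assumed.

```python
from fractions import Fraction
from itertools import combinations, product
from math import floor, inf

def all_pairs(n, weighted_edges):
    """Floyd-Warshall on an undirected weighted graph given as {(u, v): w}."""
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in weighted_edges.items():
        d[u][v] = d[v][u] = w
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][mid] + d[mid][j] < d[i][j]:
                    d[i][j] = d[i][mid] + d[mid][j]
    return d

def separated(n, edges, g, W, r):
    """True if every two graphs of the family differ by >= r + 1 on some edge of B."""
    k = floor(Fraction(g - 2, g - 1) * (Fraction(W, r + 1) + 1))
    Ws = [W - (k - i - 1) * (r + 1) for i in range(k + 1)]
    dists = []
    for col in product(range(k + 1), repeat=len(edges)):
        G = {e: Ws[c] for e, c in zip(edges, col) if c < k}
        dists.append(all_pairs(n, G))
    for A, B in combinations(dists, 2):
        if not any(A[x][y] != B[x][y] and abs(A[x][y] - B[x][y]) >= r + 1
                   for x, y in edges):
            return False
    return True

K4 = list(combinations(range(4), 2))
print(separated(4, K4, g=3, W=6, r=1))   # True: 3^6 graphs, pairwise separated
```

A gap of at least $r + 1$ between exact distances means that no single matrix can be an $r$-approximate distance matrix for both graphs.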

To complete the proof, it remains to prove Claim A.2. Let us first show that $k \ge 1$ (which is required since in the proof we use for instance that $W_0 \le W_{k-1}$). Recall that $r \le (g-2)W - 1$, so

$$k = \left\lfloor \frac{g-2}{g-1} \left( \frac{W}{r+1} + 1 \right) \right\rfloor \ge \left\lfloor \frac{g-2}{g-1} \left( \frac{W}{(g-2)W} + 1 \right) \right\rfloor = \left\lfloor \frac{g-2}{g-1} \left( \frac{1}{g-2} + 1 \right) \right\rfloor = 1 .$$

Let us show that $W_0 \ge 1$. Since $W_0 = W - (k-1)(r+1)$, it suffices to check that $(k-1)(r+1) < W$. We have

$$(k-1)(r+1) \le \left\lfloor \frac{g-2}{g-1} \cdot \frac{W}{r+1} \right\rfloor \cdot (r+1) \le \frac{g-2}{g-1} \cdot \frac{W}{r+1} \cdot (r+1) < W$$

since the girth $g$ is always at least three.

Now we have $W_0, \ldots, W_{k-1} \in [1, W]$, since the $W_i$'s are non-decreasing, $W_0 \ge 1$, and $W_{k-1} = W - (k - (k-1) - 1)(r+1) = W$.

Let us show that $W_k \le (g-1)W_0$. We have $W_k = W + r + 1$ and $W_0 = W - (k-1)(r+1)$. Therefore,

$$
\begin{aligned}
W_k \le (g-1)W_0 &\iff W + r + 1 \le (g-1)\left(W - (k-1)(r+1)\right) \\
&\iff r + 1 + (g-1)(k-1)(r+1) \le (g-2)W \\
&\iff (g-1)(k-1) \le (g-2)\frac{W}{r+1} - 1 \\
&\iff k \le \frac{g-2}{g-1} \cdot \frac{W}{r+1} - \frac{1}{g-1} + 1 = \frac{g-2}{g-1}\left( \frac{W}{r+1} + 1 \right).
\end{aligned}
$$

The latter inequality is true by the choice of $k$. This completes the proof of Claim A.2 and of Theorem A.1. □
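As a sanity check, the three statements of Claim A.2 can be tested exhaustively over a range of small parameters. The Python sketch below does this with exact rational arithmetic; it assumes $k = \lfloor \frac{g-2}{g-1}(\frac{W}{r+1} + 1) \rfloor$, and the function name and parameter ranges are arbitrary choices for the illustration.

```python
from fractions import Fraction
from math import floor

def claim_holds(g, W, r):
    """Check k >= 1, W_0..W_{k-1} in [1, W], and W_k <= (g - 1) * W_0."""
    k = floor(Fraction(g - 2, g - 1) * (Fraction(W, r + 1) + 1))
    Ws = [W - (k - i - 1) * (r + 1) for i in range(k + 1)]
    return (k >= 1
            and all(1 <= w <= W for w in Ws[:k])
            and Ws[k] <= (g - 1) * Ws[0])

ok = all(claim_holds(g, W, r)
         for g in range(3, 9)
         for W in range(1, 30)
         for r in range(0, (g - 2) * W))   # r ranges over [0, (g-2)W)
print(ok)   # True
```

The instance $g = 4$, $W = 1$, $r = 1$ gives $W_0 = 1$ and $W_k = 3 = (g-1)W_0$, so the last inequality of the claim can hold with equality.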


A collection of corollaries to Theorem A.1 can be seen in Table 4.

The case $r \ge 2$ and $W = 1$ is out of the range of our lower bound, as long as we choose for $B$ a graph with $m = \Theta(n^2)$ edges. Our lower bound still applies for $r = 2, 3$ and $W = 1$, but using girth-6 graphs $B$ that are known to exist with $m = \Theta(n^{3/2})$ edges. There are several constructions, based on finite projective geometries, of graphs with $\Theta(n^{3/2})$ edges and girth at least 6 (see for instance [64]). So, Theorem A.1 can also prove the $\Omega(\sqrt{n})$ lower bound for $r = 2, 3$ and $W = 1$. The case of larger $r$ can be captured by the more general lower bound of [32], that uses a subdivision technique, and shows that $\Omega(\sqrt{n/(r+1)})$ bit labels are required for any $r \ge 2$.

Graphs       $r = 0,\ W \ge 1$                                               $r = 1,\ W \ge 2$                                                         $r = 0,\ W = 1$; $r = (g-2)W - 1$
General      $\frac{1}{2}(n-1)\log\lceil \frac{W}{2} + 1 \rceil$             $\frac{1}{2}(n-1)\log\lfloor \frac{W}{4} + \frac{3}{2} \rfloor$           $\frac{1}{2}(n-1)$
Bipartite    $\frac{1}{4} n \log\lfloor \frac{2W}{3} + \frac{5}{3} \rfloor$  $\frac{1}{4} n \log\lfloor \frac{W}{3} + \frac{5}{3} \rfloor$             $\frac{1}{4} n$

Table 4: Lower bounds derived from Theorem A.1. For "general graphs" we use the family $\mathcal{F}_W(K_n)$, where $K_n$ denotes the complete graph on $n$ vertices, so $m = n(n-1)/2$ and $g = 3$. For "bipartite graphs" we use the family $\mathcal{F}_W(K_{n/2,n/2})$, where $K_{n/2,n/2}$ denotes the complete bipartite graph on $n$ vertices (assuming $n$ even), so $m = n^2/4$ and $g = 4$. Note that the case $r = W = 1$ and the case $r = 2W - 1$ are captured by the last column of the last line, and so the lower bound is $n/4$.

B Constructing micro trees 

Lemma B.1. Let $k$ be a positive integer. Every $m$-edge tree has an edge partition into at most $\lceil m/k \rceil$ trees of at most $2k$ edges.

Proof. Consider a tree $T$ with $m$ edges. If $T$ has fewer than $k$ edges, then the partition is $T$ itself and we are done.

Otherwise, we will construct a subtree $A$ of $T$ with at least $k$ and at most $2k$ edges such that $T - A$ is still connected. (By $T - A$ we mean the forest induced by all the edges in $E(T) - E(A)$.) Once such an $A$ is constructed, we can repeat the process on the remaining tree $T - A$ until we are left with a tree with fewer than $k$ edges. Since each subtree $A$ has at least $k$ edges, the process stops after we have constructed at most $\lceil m/k \rceil$ trees. □
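The proof leaves the choice of the subtree $A$ open. One possible way to find it, given here as an assumption rather than as the construction intended above, is to root the tree, take a deepest node $v$ with at least $k$ edges below it, and collect child branches of $v$ until at least $k$ edges are gathered. Every child branch of such a $v$ has at most $k$ edges, so the collected part has fewer than $2k$ edges, and removing it keeps the rest connected. A Python sketch:

```python
from collections import defaultdict

def partition_edges(edges, k):
    """Split the edges of a tree into subtrees with k..2k edges each,
    plus possibly one final subtree with fewer than k edges."""
    remaining = {frozenset(e) for e in edges}
    if not remaining:
        return []
    root, parts = edges[0][0], []
    while len(remaining) >= k:
        adj = defaultdict(list)
        for e in remaining:
            a, b = tuple(e)
            adj[a].append(b)
            adj[b].append(a)
        parent, order, stack = {root: None}, [], [root]
        while stack:                              # iterative preorder DFS
            u = stack.pop()
            order.append(u)
            for w in adj[u]:
                if w != parent[u]:
                    parent[w] = u
                    stack.append(w)
        below = {u: 0 for u in order}             # number of edges below each node
        for u in reversed(order):
            if parent[u] is not None:
                below[parent[u]] += below[u] + 1
        # deepest node with >= k edges below it: all its children have < k
        v = next(u for u in reversed(order) if below[u] >= k)
        part = set()
        for c in adj[v]:
            if c == parent[v]:
                continue
            part.add(frozenset((v, c)))           # edge to the child ...
            stack = [c]
            while stack:                          # ... and the child's subtree
                u = stack.pop()
                for w in adj[u]:
                    if w != parent[u]:
                        part.add(frozenset((u, w)))
                        stack.append(w)
            if len(part) >= k:
                break
        parts.append(part)
        remaining -= part
    if remaining:
        parts.append(remaining)
    return parts

path = [(i, i + 1) for i in range(10)]
print([len(p) for p in partition_edges(path, 3)])   # [3, 3, 3, 1]
```

The sketch rebuilds the adjacency lists in every round, which keeps it short but is not meant to be efficient.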

C Approximate distances 

C.1 Approximation using fewer nodes

Lemma C.1. Given a graph $G$ with $n$ nodes, edge weights in $[1, W]$ and a rooted spanning tree $T$, we can, for integers $k > 0$, construct a tree $T(k)$ whose node set is a subset of $T$ and with the following properties.

• $|T(k)| \le 1 + \frac{n}{k+1}$.

• For any node $v \in T(k)^*$, $\mathrm{dist}_G(v, \mathrm{parent}_{T(k)}(v)) \le (k+1)W$.

• For any node $w \in G$, there exists a node $v \in T(k)$ with $\mathrm{dist}_G(v, w) \le kW$.

Proof. Partition the nodes in $T$ into $k + 1$ equivalence classes according to their depth in $T$ modulo $k + 1$. One of these equivalence classes must contain $\lfloor n/(k+1) \rfloor$ or fewer nodes. Select such a subset of nodes and denote it $T(k)$. Also include the root of $T$ in $T(k)$, giving that $|T(k)| \le 1 + n/(k+1)$. In $T(k)$ construct an edge between two nodes iff no other nodes from $T(k)$ are on the simple path between the nodes in $T$. Then, for any $v \in T(k)^*$, $\mathrm{dist}_G(v, \mathrm{parent}_{T(k)}(v)) \le (k+1)W$ since the number of edges between $v$ and $\mathrm{parent}_{T(k)}(v)$ in $T$ is at most $k + 1$. Similarly, for any $w \in T$, its nearest ancestor $v$ that is also in $T(k)$ (could be $w$ itself) is at most $k$ edges up in $T$, giving $\mathrm{dist}_G(v, w) \le kW$. □
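The selection in the proof can be sketched as follows in Python. The tree is given by a parent array with node 0 as the root and `parent[v] < v` for every other node; this encoding and the function name `sparse_tree` are assumptions made for the example.

```python
from collections import Counter

def sparse_tree(parent, k):
    """Return {v: parent of v in T(k)} for the nodes selected into T(k).
    parent[0] is None (root); parent[v] < v for v >= 1."""
    n = len(parent)
    depth = [0] * n
    for v in range(1, n):
        depth[v] = depth[parent[v]] + 1
    counts = Counter(d % (k + 1) for d in depth)
    cls = min(range(k + 1), key=lambda c: counts[c])    # a smallest depth class
    chosen = {v for v in range(n) if depth[v] % (k + 1) == cls} | {0}
    tk_parent = {}
    for v in chosen:
        u = parent[v]
        while u is not None and u not in chosen:        # nearest selected proper ancestor
            u = parent[u]
        tk_parent[v] = u
    return tk_parent

# A complete binary tree on 15 nodes stored heap-style.
parent = [None] + [(i - 1) // 2 for i in range(1, 15)]
tk = sparse_tree(parent, k=1)
print(sorted(tk))   # [0, 3, 4, 5, 6]: root plus the depth-2 nodes
```

Here depths of even parity hold $1 + 4 = 5$ nodes and odd depths hold $2 + 8 = 10$, so the even class is chosen; each selected node is at most $k + 1 = 2$ tree edges from its parent in $T(k)$.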

Roughly speaking, the approximation scheme presented here applies the previous techniques to create an exact distance labeling scheme for $T(k)$, which has fewer nodes but larger weights between adjacent nodes. For a node $x$ that is not in $T(k)$, we find its nearest ancestor $x'$ in $T(k)$ and give $x$ the same label as $x'$. We then approximate $\mathrm{dist}_G(x, y)$ by $2kW + \mathrm{dist}_G(x', y')$. This will at most give us an error in $[0, 4kW]$, meaning that we now have a $(4kW)$-additive labeling scheme with labels of size $\frac{1}{2(k+1)} n \log(2(k+1)W + 1)$, ignoring second order terms. It is possible to optimize this approach and obtain a $(2kW)$-additive scheme, by only using approximate distance for either $x$ or $y$ to their nearest ancestors $x'$ or $y'$. Here, we show how to do it for the heavy path approach. A similar result holds for the micro tree approach.

As described in Section 4, the label l{x) includes the sublabels It{x) and £^(x) using 0(logn • 
log(nlH)) bits. Our new label for approximate distance will include those sublabels as well. For a node 
u in T let v' be its nearest ancestor in T{k). In x’s label we also save irix') and £'j<{x'). In addition, 
we include a label containing the distances from x to w' for all w that appear in ^(x); this label 

also uses 0(logn • log(nlF)) bits. 

Now, if $x$ or $x'$ is an ancestor of $y$ in $T$, we compute $\mathrm{dist}_G(x, y)$ as $kW$ plus the distance in $T$ from $x$ or $x'$ to $y$. This will at most give an additive error of $2kW$. Similarly for $y$ and $y'$ and $x$. Those computations can be done as explained in Section 4 using the labels defined so far. If we cannot compute the distance in this way, then consider the nearest common ancestor $z$ of $x$ and $y$ in $T$. The node $z'$ must then be the nearest common ancestor of $x'$ and $y'$ in $T(k)$. Let $n_k = |T(k)|$. We will save $\lfloor n_k/2 \rfloor$ of the values $\delta_x(\mathrm{parent}_{T(k)}(v), v)$ for all $v$ from $T(k)$ with $\mathrm{dfs}(x') < \mathrm{dfs}(v) \le \mathrm{dfs}(x') + \lfloor n_k/2 \rfloor \pmod{n_k}$. As $\delta_x(\mathrm{parent}_{T(k)}(v), v) \in [-(k+1)W, (k+1)W]$ we can encode all these $\delta$-values using $\lceil \frac{1}{2} n_k \log f((k+1)W) \rceil$ bits. We can now compute $\mathrm{dist}_G(x, y) = kW + \mathrm{dist}_G(x, z') + \sum_{v \in T(k)(z', y']} \delta_x(\mathrm{parent}_{T(k)}(v), v)$, where $\mathrm{dist}_G(x, z')$ can be computed from $\ell_T(x)$. This will at most give an additive error of $2kW$. We have now proved the following theorem.

Theorem C.2. There exists a $(2kW)$-additive distance labeling scheme for graphs with $n$ nodes and edge weights in $[1, W]$ with labels of size $\frac{1}{2(k+1)} n \log f((k+1)W) + O(\log n \cdot \log(nW))$.


C.2 Approximation using fewer weights 

With edge weights in $[1, W]$ we have been encoding numbers from the interval $I = [-W, W]$ of size $|I| = 2W + 1$ using $\frac{1}{2} n \log(2W + 1)$ bits. The theorem below uses an approximation technique where we use integers from a smaller set $I' \subseteq I$, which will reduce the space consumption but introduce an error when computing $\delta_x$-values. As we shall see, the error can be capped even when we are summing many $\delta_x$-values.


Theorem C.3. For any $D \le 2W - 1$ there exists a $\left\lceil \frac{D}{2W - D} \right\rceil$-additive distance labeling scheme for graphs with edge weights in $[1, W]$ using labels of size $\frac{1}{2} n \log(2W + 1 - D) + O(\log n \cdot \log(nW))$ bits.

Proof. Let us create a subset $I' \subseteq I$ with $|I'| = |I| - D$. In $I'$ we always include the maximum and minimum from $I$, and hence we require $D \le |I| - 2 = 2W - 1$. In addition we minimize the maximum number $Q$ of consecutive numbers from $I - I'$. Hence, for $i_1 \in I'$ (excluding the maximum) there exists a number $i_2 \in I'$ such that $i_1 < i_2 \le i_1 + Q + 1$. Since $|I'| = |I| - D = 2W + 1 - D$, we have $2W - D$ pairs of neighbors in $I'$, where by "neighbors" we mean two numbers in $I'$ with no other number from $I'$ between them. By equally spreading the $D$ missing numbers between the $2W - D$ pairs, we can obtain $Q = \left\lceil \frac{D}{2W - D} \right\rceil$. By substituting values in $I'$ for values in $I$, we can now encode $t$ values from $I$ with $\lceil t \log(2W + 1 - D) \rceil$ bits. This will introduce an error, but in the case of $\delta_x$-values, the accumulated error can be kept at most $Q$ as described below.

Let $T$ be a tree and consider the $\delta_x$-values $\delta_x(v) = \delta_x(\mathrm{parent}(v), v)$ for nodes $v \in T^*$. Each $\delta_x$-value is a number in $I = [-W, W]$. We will visit the nodes top down starting from (but not including) the root $r$, assigning to each node $y$ a new approximate value $\tilde{\delta}_x(y) \in I'$. (For the root $r$ we implicitly associate the value 0. Implicitly, since it may not be a value in $I'$.) Recall that $\delta_x(r, y) = \sum_{v \in T[y, r)} \delta_x(v)$, and define $\tilde{\delta}_x(r, y) = \sum_{v \in T[y, r)} \tilde{\delta}_x(v)$. We will assign the values such that $\delta_x(r, y) \le \tilde{\delta}_x(r, y) \le \delta_x(r, y) + Q$.

For $y \in T^*$, let $\Delta(y) = \tilde{\delta}_x(r, y) - \delta_x(r, y)$. We prove by induction that $\Delta(y) \in [0, Q]$. So assume inductively that $\Delta(\mathrm{parent}(y)) \in [0, Q]$. If $\delta_x(y) \in I'$, we can define $\tilde{\delta}_x(y) = \delta_x(y)$, and we then have $\Delta(y) = \Delta(\mathrm{parent}(y)) \in [0, Q]$ as desired. If this is not the case, let $i_1, i_2 \in I'$ be the largest and smallest numbers from $I'$, respectively, with $i_1 < \delta_x(y) < i_2$. By assumption, $i_2 - i_1 \le Q + 1$. If $\Delta(\mathrm{parent}(y)) + i_1 - \delta_x(y) \ge 0$, we can set $\tilde{\delta}_x(y) = i_1$ and obtain $\Delta(y) \in [0, Q]$. If not, then we must have $\Delta(\mathrm{parent}(y)) + i_2 - \delta_x(y) < i_2 - i_1 \le Q + 1$, so we can set $\tilde{\delta}_x(y) = i_2$ and obtain $\Delta(y) \in [0, Q]$. This concludes the proof of the theorem.

Above we have been changing all $\delta_x$-values top-down from the root. In the constant time solution, we could instead change the values top-down for each micro tree $T_i$, keeping exact distances to the root and in the macro tree. □
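The rounding procedure of the proof can be sketched in Python as follows. The sketch builds one possible set $I'$ by spreading the $D$ removed integers evenly, and then rounds the exact values top-down while tracking the accumulated error $\Delta$. The parent-array encoding of the tree (node 0 is the root, `parent[v] < v`) and the function names are assumptions made for the example.

```python
import bisect

def reduced_set(W, D):
    """One choice of I' within [-W, W]: keep both endpoints and spread the
    D removed integers as evenly as possible over the 2W - D gaps."""
    gaps = 2 * W - D
    base, extra = divmod(D, gaps)
    kept, cur = [-W], -W
    for i in range(gaps):
        cur += 1 + base + (1 if i < extra else 0)
        kept.append(cur)
    return kept                       # sorted; the last element is W

def round_deltas(parent, delta, W, D):
    """Round delta[v] (value on edge parent[v] -> v) into I', top-down.
    Returns (rounded values, accumulated errors Delta(v))."""
    kept = reduced_set(W, D)
    approx = [0] * len(parent)
    err = [0] * len(parent)           # Delta(root) = 0
    for v in range(1, len(parent)):
        pos = bisect.bisect_left(kept, delta[v])
        if kept[pos] == delta[v]:
            choice = delta[v]         # value already in I'
        else:
            lo, hi = kept[pos - 1], kept[pos]
            # round down if the accumulated error stays non-negative, else up
            choice = lo if err[parent[v]] + lo - delta[v] >= 0 else hi
        approx[v] = choice
        err[v] = err[parent[v]] + choice - delta[v]
    return approx, err

W, D = 3, 3                            # Q = ceil(3 / 3) = 1
print(reduced_set(W, D))               # [-3, -1, 1, 3]
parent = [None, 0, 1, 2, 3]            # a path rooted at node 0
approx, err = round_deltas(parent, [0, 2, 0, -2, 2], W, D)
print(approx, err)                     # [0, 3, -1, -1, 1] [0, 1, 0, 1, 0]
```

In the example every accumulated error stays in $[0, Q] = [0, 1]$ although four consecutive values had to be rounded.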


C.3 Final approximation 


We can combine the above two approaches by, for a $k > 0$, first using Appendix C.1 to obtain a $(2kW)$-additive distance labeling scheme with labels of size $\frac{1}{2(k+1)} n \log f((k+1)W) + O(\log n \cdot \log(nW))$. The approximate scheme will use edge weights in $[1, (k+1)W]$, to which we can then apply the technique from Appendix C.2, finally getting:


Theorem C.4. For any $k > 0$ and $D \le 2(k+1)W - 1$ there exists a $\left(2kW + \left\lceil \frac{D}{2(k+1)W - D} \right\rceil\right)$-additive distance labeling scheme for graphs with $n$ nodes and edge weights in $[1, W]$ using labels of size $\frac{1}{2(k+1)} n \log(2(k+1)W + 1 - D) + O(\log n \cdot \log(nW))$ bits.